The aim of this report is to determine which factors of university life correlate most strongly with stress, whether the time spent on each activity has a linear relationship with stress, and what the best way to reduce stress is. We designed a survey, had it filled out by university students, and analyzed the responses with relevant graphs to look for correlations or relationships that answer our research questions.
Our main finding, contrary to our expectations, is that there is no strong correlation between the time spent on each aspect of university life and stress. No firm conclusions can be drawn from our current data set, and we believe this is because individuals differ too much for a single generalization to apply to everyone. Further research with a larger sample size, and potentially better-designed survey questions, is needed, but that is beyond the scope of this report.
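One way to put a number on "no strong correlation" is the Pearson correlation coefficient between each weekly time column and the rescaled stress level, where values near 0 indicate a weak linear relationship. The sketch below assumes a data frame shaped like ours (numeric columns ending in `_Time` plus `Stress_Level_New`); the helper name `time_stress_correlations` and the toy values are hypothetical and are not our survey data.

```r
# Hypothetical helper: Pearson correlation of every "_Time" column with stress.
# Assumes `df` has numeric columns whose names end in "_Time" and a numeric
# stress column (by default Stress_Level_New, stress out of 10).
time_stress_correlations <- function(df, stress_col = "Stress_Level_New") {
  time_cols <- grep("_Time$", names(df), value = TRUE)
  # use = "complete.obs" drops rows with missing values before correlating
  sapply(time_cols, function(col) {
    cor(df[[col]], df[[stress_col]], use = "complete.obs", method = "pearson")
  })
}

# Toy example with made-up numbers (not survey responses)
toy <- data.frame(
  Lec_Time = c(5, 8, 10, 12),
  Sleep_Time = c(8, 7, 6, 5),
  Stress_Level_New = c(3, 5, 6, 8)
)
round(time_stress_correlations(toy), 2)
```

On our data it could be called as `time_stress_correlations(data)`; a coefficient only measures linear association, so it complements rather than replaces the graphs.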
We kept our research questions in mind while designing our survey. To make the time spent on each aspect of the participants’ university life a continuous variable, we asked for a number instead of offering options to choose from. We did the same for stress levels, but since we anticipated that participants would give integer values and so make the variable discrete, we tried to make it closer to continuous by asking for a number from 1-100 instead of 1-10. In our graphs, we divided all the responses by 10 to bring them back to a scale out of 10, so that an integer response such as 14 became the decimal 1.4.
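The rescaling is a single division. A minimal sketch in R follows; the helper name `rescale_stress` is ours for illustration, while the input values are the first few `Stress_Level` responses shown in the `str(data)` output (in our data the result is stored as `Stress_Level_New`).

```r
# Rescale raw 1-100 stress ratings to a scale out of 10 by dividing by 10,
# so an integer answer such as 14 becomes the decimal 1.4.
rescale_stress <- function(stress_1_to_100) {
  stopifnot(is.numeric(stress_1_to_100))
  stress_1_to_100 / 10
}

# First six Stress_Level values from our data set
raw <- c(70, 65, 70, 60, 30, 14)
rescale_stress(raw)
# [1] 7.0 6.5 7.0 6.0 3.0 1.4
```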
The participants were all university students, but not necessarily from the University of Sydney; many of them were from Singapore and India. Although our sample size of 53 was respectable, the extra factors this introduced, such as differences in culture, school and country, may have made the data more varied and contributed to the lack of a strong trend in our first research question.
The survey was made on Google Forms and distributed by our group members sharing the link. Since Google Forms is a secure and reliable way of collecting answers, we believe the only likely source of error is participants answering imprecisely or incorrectly, despite their best intentions.
Size of data
dim(data)
## [1] 53 17
R’s classification of data
class(data)
## [1] "data.frame"
R’s classification of variables
str(data)
## 'data.frame': 53 obs. of 17 variables:
## $ Lec_Rank : int 4 6 4 1 1 6 7 4 5 3 ...
## $ Soc/Sport_Rank : int 2 2 7 4 5 5 7 3 1 5 ...
## $ Club_Rank : int 2 1 2 2 4 2 7 7 2 6 ...
## $ Assignments_Rank : int 6 7 5 4 1 7 7 6 7 1 ...
## $ Commute_Rank : int 5 4 1 1 2 1 7 2 3 4 ...
## $ Fam_Rank : int 2 2 5 1 3 4 7 1 1 7 ...
## $ Less_Sleep_Rank : int 1 2 3 4 2 3 7 2 5 2 ...
## $ Lec_Time : num 5 8 8 8 0 10 2 17 12 10 ...
## $ Soc/Sport_Time : num 4 2 15 0 8 3 0 8 1 0 ...
## $ Club_Time : num 4 0 5 6 8 0 10 6 2 12 ...
## $ Assignment_Time : num 10 12 10 30 0 5 10 20 10 6 ...
## $ Commute_Time : num 5 8 8 2 5 1 5 4 5 6 ...
## $ Fam_Time : int 5 5 15 4 10 20 5 6 40 10 ...
## $ Sleep_Time : num 6 6 7 7 6 8 8 9 5.5 7 ...
## $ Reduce_Stress_Factors: Factor w/ 23 levels "Go for drinks, Reading novels or articles",..: 13 13 19 8 8 14 2 8 19 8 ...
## $ Stress_Level : int 70 65 70 60 30 14 60 20 65 70 ...
## $ Stress_Level_New : num 7 6.5 7 6 3 1.4 6 2 6.5 7 ...
Summary:
Our initial dataset had 17 columns, from the 10 questions in our survey.
All the “_Rank” columns were discrete (ordinal) variables recording how highly each participant rated that factor as a contributor to stress, on a scale from 1 (least) to 7 (most).
All the “_Time” columns were continuous quantitative variables giving the number of hours each participant spent on that activity per week, except for Sleep_Time, which was the number of hours per day.
“Reduce_Stress_Factors” was a categorical variable recording the activities each participant chose to reduce their stress, which let us count how many people used each activity.
“Stress_Level” held the number from 1-100 that each participant gave to describe how stressed they were, and “Stress_Level_New” held the same value divided by 10.
R’s classification of every variable matched how we intended to use it.
While cleaning the data, we identified and removed non-serious responses that introduced large outliers, and we renamed the columns in R from those of the initial data set.
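The report does not list the exact rules used to spot non-serious responses, so the sketch below only illustrates the idea with assumed plausibility limits: stress between 1 and 100, daily sleep between 0 and 24 hours, and weekly activity hours between 0 and 168 (the hours in a week). The function name `remove_implausible` and these thresholds are hypothetical.

```r
# Hypothetical cleaning rule: drop rows whose answers are physically implausible.
# The thresholds are illustrative assumptions, not the criteria used in this report.
remove_implausible <- function(df) {
  time_cols <- grep("_Time$", names(df), value = TRUE)
  weekly_cols <- setdiff(time_cols, "Sleep_Time")  # Sleep_Time is per day
  keep <- df$Stress_Level >= 1 & df$Stress_Level <= 100 &  # stress asked on 1-100
    df$Sleep_Time >= 0 & df$Sleep_Time <= 24               # hours in a day
  for (col in weekly_cols) {
    keep <- keep & df[[col]] >= 0 & df[[col]] <= 168       # hours in a week
  }
  # which() also drops rows where a check evaluated to NA
  df[which(keep), , drop = FALSE]
}

# Toy responses (made up): the third claims 500 lecture hours in a week
toy <- data.frame(
  Lec_Time = c(5, 8, 500),
  Sleep_Time = c(7, 6, 8),
  Stress_Level = c(70, 65, 50)
)
remove_implausible(toy)  # keeps the first two rows
```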
For our next research question, we set out to find which activities best reduced our participants’ stress levels. In our survey, we gave them a list of activities commonly used to reduce stress and asked them to pick two.
We then made a bar plot of the average stress level of the participants who chose each activity.
library(plotly)
# Stress levels (out of 10) of the participants who chose each stress-reducing activity
Walk_Stress = data[grep('Go on a walk', data$Reduce_Stress_Factors), ]$Stress_Level_New
Sports_Stress = data[grep('Play some sports', data$Reduce_Stress_Factors), ]$Stress_Level_New
Drinks_Stress = data[grep('Go for drinks', data$Reduce_Stress_Factors), ]$Stress_Level_New
Novels_Stress = data[grep('Reading novels or articles', data$Reduce_Stress_Factors), ]$Stress_Level_New
TV_Stress = data[grep('Watching Television', data$Reduce_Stress_Factors), ]$Stress_Level_New
Comp_Games_Stress = data[grep('Playing Computer Games', data$Reduce_Stress_Factors), ]$Stress_Level_New
Sleep_Stress = data[grep('Sleep', data$Reduce_Stress_Factors), ]$Stress_Level_New
Fam_Stress = data[grep('Spend time with family and friends', data$Reduce_Stress_Factors), ]$Stress_Level_New
Smoke_Stress = data[grep('Smoke', data$Reduce_Stress_Factors), ]$Stress_Level_New
Youtube_Stress = data[grep('Youtube', data$Reduce_Stress_Factors), ]$Stress_Level_New
# Averaging the stress levels for each activity; naming the values keeps each label attached to its average
Y_Values = c("Walk" = mean(Walk_Stress), "Sports" = mean(Sports_Stress), "Drinks" = mean(Drinks_Stress), "Reading" = mean(Novels_Stress), "TV" = mean(TV_Stress), "Comp-Games" = mean(Comp_Games_Stress), "Sleep" = mean(Sleep_Stress), "Family/Friends" = mean(Fam_Stress), "Smoke" = mean(Smoke_Stress), "Youtube" = mean(Youtube_Stress))
# Sorting the averages from lowest to highest; the names are reordered together with the values
Y_Values = sort(Y_Values)
X_names = names(Y_Values)
# Producing and formatting the graph; categoryarray takes the category labels in display order
Reduce_Stress_Graph = plot_ly(x = X_names, y = unname(Y_Values), type = 'bar', text = round(unname(Y_Values), 2), marker = list(color = 'rgb(158,202,225)', line = list(color = 'rgb(8, 48, 107)', width = 1.2)))
Reduce_Stress_Graph = layout(Reduce_Stress_Graph, title = "Stress Level vs Reducing Factors", xaxis = list(title = "Stress Reducing Factors", categoryorder = "array", categoryarray = X_names), yaxis = list(title = "Average Stress Level (Out of 10)"), paper_bgcolor = 'rgba(245, 246, 249, 1)', plot_bgcolor = 'rgba(245, 246, 249, 1)')
Reduce_Stress_Graph
Smoking and watching YouTube videos were each chosen by only one participant, so we left them out of our analysis. Judged by the lowest average stress level among the participants who chose it, the most effective way of coping with stress seemed to be going out for drinks, followed by playing computer games. Apart from watching YouTube videos, the least effective was going for walks, followed by watching TV.
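Excluding activities with a single respondent can be made explicit by counting respondents alongside each average. The sketch below reuses the substring-matching idea of the `grep()` calls in our plotting code; the function name `summarise_coping`, the `min_n = 2` threshold and the toy responses are illustrative assumptions, not part of our analysis.

```r
# Hypothetical helper: number of respondents and mean stress (out of 10) for each
# coping activity, keeping only activities chosen by at least `min_n` people,
# sorted from lowest to highest average stress.
# `patterns` maps a short label to a text fragment found in the survey answer.
summarise_coping <- function(choices, stress, patterns, min_n = 2) {
  choices <- as.character(choices)
  rows <- lapply(names(patterns), function(label) {
    hit <- grepl(patterns[[label]], choices, fixed = TRUE)
    data.frame(activity = label, n = sum(hit), mean_stress = mean(stress[hit]),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out <- out[out$n >= min_n, ]
  out <- out[order(out$mean_stress), ]
  rownames(out) <- NULL
  out
}

# Toy answers in the same "Activity A, Activity B" form as our survey (made up)
choices <- c("Go for drinks, Reading novels or articles",
             "Go for drinks, Sleep",
             "Smoke, Sleep",
             "Go on a walk, Sleep")
stress <- c(3, 5, 8, 7)
patterns <- c(Drinks = "Go for drinks", Sleep = "Sleep",
              Smoke = "Smoke", Walk = "Go on a walk")
summarise_coping(choices, stress, patterns)
# Smoke and Walk are dropped (one respondent each); Drinks then Sleep remain
```

Reporting `n` next to each average also makes the uneven group sizes visible at a glance.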
We have some concerns about possible confounding factors, such as the personalities of the people who choose these activities. For example, people who go for drinks might care less about their university grades, or have better support networks for dealing with stress, than people who cope by watching TV or going for walks. The responses were also not evenly distributed across the coping methods, so some methods have much smaller samples than others. However, within the limits of this survey, dealing with these confounding factors is beyond the scope of this report.
Summary:
We refrain from concluding that certain methods are objectively better than others, because students differ too much from one another.
In the survey, we asked participants for their general stress level and asked them to rate each of the 7 factors contributing to that stress. (We used the same factors as Question 1: lectures, sports, partying, doing assignments, family time, commuting and sleeping.)
These factors work together and are not independent of one another, so this question focuses on time management: how students allocate their hours across the week to different activities. We hoped this would give us more usable data by reducing the effect of individual differences, which weigh heavily in a small sample.
We found that students who commute for more hours a week tended to rate commuting as a larger source of stress. This shows up as an upward trend in the average rank across the commute-time intervals in the bar plot.
# Calculating the average Commute_Rank for each interval of weekly commute hours
Commute_Avg_Rank = c(
  mean(data[which(data$Commute_Time < 4 & data$Commute_Time >= 0), ]$Commute_Rank),
  mean(data[which(data$Commute_Time < 8 & data$Commute_Time >= 4), ]$Commute_Rank),
  mean(data[which(data$Commute_Time < 12 & data$Commute_Time >= 8), ]$Commute_Rank),
  mean(data[which(data$Commute_Time < 16 & data$Commute_Time >= 12), ]$Commute_Rank),
  mean(data[which(data$Commute_Time <= 20 & data$Commute_Time >= 16), ]$Commute_Rank))
# Producing and formatting the graph
Commute_Rank_Graph = plot_ly(x = c("0-4 Hours", "4-8 Hours", "8-12 Hours", "12-16 Hours", "16-20 Hours"), y = Commute_Avg_Rank, type = "bar", text = round(Commute_Avg_Rank, 2), marker = list(color = 'rgb(158,202,225)', line = list(color = 'rgb(8, 48, 107)', width = 1.2)))
Commute_Rank_Graph = layout(Commute_Rank_Graph, title = "Rank vs Commute Hours", xaxis = list(title = "Hours in Commute per week", categoryorder = "array", categoryarray = c("0-4 Hours", "4-8 Hours", "8-12 Hours", "12-16 Hours", "16-20 Hours")), yaxis = list(title = "Average Rank (Out of 7)"), paper_bgcolor = 'rgba(245, 246, 249, 1)', plot_bgcolor = 'rgba(245, 246, 249, 1)')
Commute_Rank_Graph
A likely explanation is that time spent in transit could otherwise be used for other activities, particularly sleep and study.
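The interval-by-interval `mean()` calls in the commute code can also be written once with base R's `cut()` and `tapply()`. This is a sketch of an alternative, not the code used to produce our graphs; the helper name `avg_rank_by_bin` and the toy values are hypothetical. `right = FALSE` gives intervals of the form [a, b), and `include.lowest = TRUE` then closes the last interval, matching the `<= 20` in our final commute bin.

```r
# Hypothetical helper: average rank within each interval of hours.
# right = FALSE makes intervals [a, b); include.lowest = TRUE closes the last one.
avg_rank_by_bin <- function(hours, rank, breaks) {
  bins <- cut(hours, breaks = breaks, right = FALSE, include.lowest = TRUE)
  # tapply returns NA for an interval with no respondents
  tapply(rank, bins, mean)
}

# Toy data (made up): weekly commute hours and the rank given to commuting
hours <- c(1, 3, 5, 9, 18, 20)
rank <- c(1, 2, 3, 4, 6, 7)
avg_rank_by_bin(hours, rank, breaks = c(0, 4, 8, 12, 16, 20))
# Averages per interval: 1.5, 3, 4, NA (no one in 12-16), 6.5
```

Applied to our data as `avg_rank_by_bin(data$Commute_Time, data$Commute_Rank, c(0, 4, 8, 12, 16, 20))`, it should give the same averages as `Commute_Avg_Rank`, except that an empty interval appears as `NA` rather than `NaN`.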
Again in line with Question 1, we found that individuals who get less sleep per night tend to rank lack of sleep higher as a cause of stress. This is reflected in the downward trend of the bar plot.
# Calculating the average Less_Sleep_Rank for each interval of daily sleep hours
Sleep_Avg_Rank = c(
  mean(data[which(data$Sleep_Time < 5 & data$Sleep_Time >= 0), ]$Less_Sleep_Rank),
  mean(data[which(data$Sleep_Time < 6 & data$Sleep_Time >= 5), ]$Less_Sleep_Rank),
  mean(data[which(data$Sleep_Time < 7 & data$Sleep_Time >= 6), ]$Less_Sleep_Rank),
  mean(data[which(data$Sleep_Time < 8 & data$Sleep_Time >= 7), ]$Less_Sleep_Rank),
  mean(data[which(data$Sleep_Time < 9 & data$Sleep_Time >= 8), ]$Less_Sleep_Rank),
  mean(data[which(data$Sleep_Time < 10 & data$Sleep_Time >= 9), ]$Less_Sleep_Rank))
# Producing and formatting the graph
Sleep_Rank_Graph = plot_ly(x = c("0-5 Hours", "5-6 Hours", "6-7 Hours", "7-8 Hours", "8-9 Hours", "9-10 Hours"), y = Sleep_Avg_Rank, type = "bar", marker = list(color = 'rgb(158,202,225)', line = list(color = 'rgb(8, 48, 107)', width = 1.2)))
Sleep_Rank_Graph = layout(Sleep_Rank_Graph, title = "Rank vs Sleep Hours", xaxis = list(title = "Hours of Sleep per day", categoryorder = "array", categoryarray = c("0-5 Hours", "5-6 Hours", "6-7 Hours", "7-8 Hours", "8-9 Hours", "9-10 Hours")), yaxis = list(title = "Average Rank (Out of 7)"), paper_bgcolor = 'rgba(245, 246, 249, 1)', plot_bgcolor = 'rgba(245, 246, 249, 1)')
Sleep_Rank_Graph
We already know from Question 1 that people who get less sleep are more stressed overall. This graph shows that they also rank lack of sleep as a cause of that stress.
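Because the `_Rank` variables are ordinal, Spearman's rank correlation is one way to summarize the trends in the sleep and commute graphs with a single number, instead of reading them off the bars. A minimal sketch using base R's `cor()`; the helper name `rank_trend` and the toy values are hypothetical.

```r
# Hypothetical helper: Spearman's rank correlation between hours spent on an
# activity and the rank given to it as a cause of stress.
# Negative: fewer hours go with a higher rank (expected for sleep).
# Positive: more hours go with a higher rank (expected for commuting).
rank_trend <- function(hours, rank) {
  cor(hours, rank, method = "spearman", use = "complete.obs")
}

# Toy data (made up): the less someone sleeps, the higher they rank lack of sleep
sleep_hours <- c(5, 5.5, 6, 7, 8, 9)
sleep_rank <- c(7, 6, 5, 4, 2, 1)
rank_trend(sleep_hours, sleep_rank)  # perfectly decreasing, so -1
```

It could be applied to our data as `rank_trend(data$Sleep_Time, data$Less_Sleep_Rank)` or `rank_trend(data$Commute_Time, data$Commute_Rank)`; with many tied ranks, the coefficient should be read as a rough indication of direction and strength.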
Stress is a largely subjective response that varies between individuals, which made the data difficult to analyze with a relatively small sample. However, one trend was consistent throughout the survey: sleep is closely tied to stress. We were surprised by how strongly sleep was associated with stress levels, which suggests that even a few extra hours of sleep a night could benefit students considerably.
Survey link
https://tinyurl.com/y45nulux